林嶔 (Lin, Chin)
Lesson 22
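– Back to our handwritten-digit classification problem: when we look at these handwritten digits we recognize them at a glance, but is the step from "picture" to "concept" really that simple?
– Their research found that when a cat is stimulated by images of different shapes, the brain cells of its receptive fields respond differently.
– A convolution filter mimics the earliest cells of the receptive field: each one is responsible for recognizing one specific feature. The mathematical model is as follows:
– What does a "feature map" mean? A convolution filter is like the most basic visual cell: it specializes in recognizing one simple feature, so the larger a value on the feature map, the better that location matches the feature the cell is responsible for.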
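To make the idea of a feature map concrete, here is a minimal sketch of the sliding sum of element-wise products. The helper name `conv2d.valid` and the toy image are assumptions made for illustration only. Like the rest of these notes, it does not flip the filter, so strictly speaking it computes a cross-correlation.

```r
# Slide the filter over the image (no padding) and take the sum of
# element-wise products at every position.
conv2d.valid <- function(img, filter) {
  k.row <- nrow(filter)
  k.col <- ncol(filter)
  out <- matrix(NA, nrow = nrow(img) - k.row + 1, ncol = ncol(img) - k.col + 1)
  for (i in 1:nrow(out)) {
    for (j in 1:ncol(out)) {
      patch <- img[i:(i + k.row - 1), j:(j + k.col - 1)]
      out[i, j] <- sum(patch * filter)
    }
  }
  out
}

# Toy 5x5 "image" with a single bright pixel in the centre.
toy.img <- matrix(0, nrow = 5, ncol = 5)
toy.img[3, 3] <- 1

# Filter that responds to a pixel brighter than its 8 neighbours.
spot.filter <- matrix(c(-1, -1, -1,
                        -1, +8, -1,
                        -1, -1, -1), nrow = 3)

feature.map <- conv2d.valid(toy.img, spot.filter)
print(feature.map)
```

The largest value in the 3x3 feature map sits at its centre, the only position where the bright pixel lines up with the +8 weight of the filter.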
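Once we have the feature maps, recall that we add non-linear functions to a neural network to make it mathematically more expressive. In a convolutional neural network, the feature maps produced by a convolution layer therefore also go through a non-linear transformation.
Next, because feature maps from successive convolutions carry redundant information, we use a "pooling layer" to reduce the dimension of the image. In effect this is the same as lowering the image's resolution.
The original image (28x28x1) first passes through 20 5x5 "convolution filters" (5x5x1x20), which turn it into 20 "feature maps" (24x24x20).
These 20 "feature maps" (24x24x20) then go through a non-linear transformation, producing 20 "transformed feature maps" (24x24x20).
These 20 "transformed feature maps" (24x24x20) are then processed by a 2x2 "pooling window" (2x2), which turns them into 20 "reduced feature maps" (12x12x20).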
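The three steps above can be traced for a single filter with a few lines of base R. This is only a sketch: the random image and random filter are stand-ins, ReLU is assumed as the non-linear function, and the pooling is assumed to be max pooling with stride 2.

```r
set.seed(1)
img    <- matrix(runif(28 * 28), nrow = 28)   # stand-in for one 28x28 digit
filter <- matrix(rnorm(5 * 5), nrow = 5)      # one arbitrary 5x5 filter

# 1) convolution without padding: 28 - 5 + 1 = 24
feature <- matrix(NA, nrow = 24, ncol = 24)
for (i in 1:24) {
  for (j in 1:24) {
    feature[i, j] <- sum(img[i:(i + 4), j:(j + 4)] * filter)
  }
}

# 2) non-linear transform (ReLU): negative values become 0, size unchanged
activated <- pmax(feature, 0)

# 3) 2x2 max pooling with stride 2: 24 / 2 = 12
pooled <- matrix(NA, nrow = 12, ncol = 12)
for (i in 1:12) {
  for (j in 1:12) {
    rows <- (2 * i - 1):(2 * i)
    cols <- (2 * j - 1):(2 * j)
    pooled[i, j] <- max(activated[rows, cols])
  }
}

dim(feature)
dim(activated)
dim(pooled)
```

With 20 filters the same computation is simply repeated 20 times, which is where the third dimension in 24x24x20 and 12x12x20 comes from.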
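– Imagine a picture of a person, and suppose the first filter recognizes eyes, the second recognizes noses, the third recognizes ears, the fourth recognizes palms, and the fifth recognizes arms.
– The higher a value in feature maps 1, 2, and 3, the more likely that location holds the eyes, nose, or ears respectively. If we look at these 3 feature maps together and convolve once more, can we locate the face?
– The higher a value in feature maps 4 and 5, the more likely that location holds the palm or the arm respectively. If we look at these 2 feature maps together and convolve once more, can we locate the hand?
– Feature maps 4 and 5 are also useful for face recognition: a face contains no palm or arm, so a filter that wants to recognize faces must put positive weights on feature maps 1, 2, and 3 and negative weights on feature maps 4 and 5.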
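The positive and negative weighting can be illustrated with a deliberately simplified sketch. The five feature maps, their values, and the weights below are all made up, and the "face" filter is assumed to be 1x1 in space, so it only mixes the five maps at each location.

```r
n <- 4
maps <- array(0, dim = c(n, n, 5))      # 5 first-level feature maps (hypothetical values)
maps[2, 2, 1:3] <- 1                    # eyes, nose and ears all respond near (2, 2)
maps[4, 4, 4:5] <- 1                    # palm and arm respond near (4, 4)

face.weights <- c(+1, +1, +1, -1, -1)   # assumed weights of a "face" filter

# Weighted sum across the 5 maps at every location.
face.map <- matrix(0, nrow = n, ncol = n)
for (k in 1:5) {
  face.map <- face.map + face.weights[k] * maps[, , k]
}

print(face.map)
which(face.map == max(face.map), arr.ind = TRUE)
```

The score is highest where the eye, nose, and ear maps agree and lowest where the palm and arm maps respond, so the negative weights actively push the "face" response away from the hand.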
– This is a picture of parrots
library(imager)
img <- load.image(system.file("extdata/parrots.png", package="imager"))
gary.img <- grayscale(img)
plot(gary.img)
– Let's try to obtain its feature map with a specially structured convolution filter!
conv.filter.1 = matrix(c(-1, -1, -1,
-1, +8, -1,
-1, -1, -1), nrow = 3)
img.matrix = as.matrix(gary.img)
feature.img = matrix(NA, nrow = nrow(img.matrix) - 2, ncol = ncol(img.matrix) - 2)
# slide the 3x3 filter over every position of the image
for (i in 1:nrow(feature.img)) {
  for (j in 1:ncol(feature.img)) {
    sub.img.matrix = img.matrix[0:2+i,0:2+j]
    feature.img[i,j] = sum(sub.img.matrix * conv.filter.1)
  }
}
new.img = as.cimg(feature.img)
plot(new.img)
– Now try some other convolution filters yourself!
conv.filter.2 = matrix(c(-1, 0, +1,
-2, 0, +2,
-1, 0, +1), nrow = 3)
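Before running this filter on the parrots, it may help to see what it reacts to on a toy matrix. The sketch below uses a hypothetical helper `apply.filter` and two made-up 6x6 step images. Note that `matrix()` fills column by column, so the filter actually stored is the transpose of how it is typed.

```r
conv.filter.2 <- matrix(c(-1, 0, +1,
                          -2, 0, +2,
                          -1, 0, +1), nrow = 3)
print(conv.filter.2)   # rows are (-1,-2,-1), (0,0,0), (1,2,1)

# Same sliding sum of products as in the loop above.
apply.filter <- function(img, filter) {
  out <- matrix(NA, nrow = nrow(img) - 2, ncol = ncol(img) - 2)
  for (i in 1:nrow(out)) {
    for (j in 1:ncol(out)) {
      out[i, j] <- sum(img[0:2 + i, 0:2 + j] * filter)
    }
  }
  out
}

# Toy image whose values jump from 0 to 1 along the first (row) index ...
edge.rows <- matrix(rep(c(0, 0, 0, 1, 1, 1), times = 6), nrow = 6)
# ... and its transpose, where the jump runs along the second (column) index.
edge.cols <- t(edge.rows)

feature.rows <- apply.filter(edge.rows, conv.filter.2)
feature.cols <- apply.filter(edge.cols, conv.filter.2)
print(feature.rows)
print(feature.cols)
```

The filter responds only where the values change along the first index of the matrix and gives 0 everywhere for the transposed image, so it acts as a detector of edges in one direction.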
– Let's review the structure of the handwritten-digit data once more
DAT = read.csv("data/train.csv")
#Split data
set.seed(0)
Train.sample = sample(1:nrow(DAT), nrow(DAT)*0.6, replace = FALSE)
Train.X = DAT[Train.sample,-1]/255
Train.Y = DAT[Train.sample,1]
Test.X = DAT[-Train.sample,-1]/255
Test.Y = DAT[-Train.sample,1]
#Display
library(imager)
par(mar=rep(0,4), mfcol = c(4, 4))
for (i in 1:16) {
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  img = as.raster(t(matrix(as.numeric(Train.X[i,]), nrow = 28)))
  rasterImage(img, -0.04, -0.04, 1.04, 1.04, interpolate=FALSE)
  text(0.05, 0.95, Train.Y[i], col = "green", cex = 2)
}
Train.X.array = t(Train.X)
dim(Train.X.array) <- c(28, 28, 1, nrow(Train.X))
Test.X.array <- t(Test.X)
dim(Test.X.array) <- c(28, 28, 1, nrow(Test.X))
library(mxnet)
# input
data <- mx.symbol.Variable('data')
# first conv
conv1 <- mx.symbol.Convolution(data=data, kernel=c(5,5), num_filter=20)
relu1 <- mx.symbol.Activation(data=conv1, act_type="relu")
pool1 <- mx.symbol.Pooling(data=relu1, pool_type="max",
kernel=c(2,2), stride=c(2,2))
# second conv
conv2 <- mx.symbol.Convolution(data=pool1, kernel=c(5,5), num_filter=50)
relu2 <- mx.symbol.Activation(data=conv2, act_type="relu")
pool2 <- mx.symbol.Pooling(data=relu2, pool_type="max",
kernel=c(2,2), stride=c(2,2))
# first fullc
flatten <- mx.symbol.Flatten(data=pool2)
fc1 <- mx.symbol.FullyConnected(data=flatten, num_hidden=500)
relu3 <- mx.symbol.Activation(data=fc1, act_type="relu")
# second fullc
fc2 <- mx.symbol.FullyConnected(data=relu3, num_hidden=10)
# loss
lenet <- mx.symbol.SoftmaxOutput(data=fc2)
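– First convolution block
The original image (28x28x1) first passes through 20 5x5 "convolution filters" (5x5x1x20), which turn it into 20 "first-level feature maps" (24x24x20).
These 20 "first-level feature maps" (24x24x20) then go through ReLU, producing 20 "transformed first-level feature maps" (24x24x20).
These 20 "transformed first-level feature maps" (24x24x20) are then processed by a 2x2 "pooling window" (2x2), which turns them into 20 "reduced first-level feature maps" (12x12x20).
– Second convolution block
The 20 "reduced first-level feature maps" (12x12x20) pass through 50 5x5 "convolution filters" (5x5x20x50), which turn them into 50 "second-level feature maps" (8x8x50).
These 50 "second-level feature maps" (8x8x50) then go through ReLU, producing 50 "transformed second-level feature maps" (8x8x50).
These 50 "transformed second-level feature maps" (8x8x50) are then processed by a 2x2 "pooling window" (2x2), which turns them into 50 "reduced second-level feature maps" (4x4x50).
– Fully connected layers
The "reduced second-level feature maps" (4x4x50) are rearranged and flattened into the "first-level high-level features" (800).
The "first-level high-level features" (800) enter the "hidden layer", which outputs the "second-level high-level features" (500).
The "second-level high-level features" (500) go through ReLU, giving the "transformed second-level high-level features" (500).
The "transformed second-level high-level features" (500) enter the "output layer", which produces the "raw output" (10).
The "raw output" (10) is transformed by the Softmax function to decide which class the image belongs to.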
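The last step, turning 10 raw outputs into a class, can be sketched in a few lines. The scores below are made up, and the helper `softmax` is written here for illustration rather than taken from mxnet.

```r
softmax <- function(z) {
  z <- z - max(z)              # subtract the maximum for numerical stability
  exp(z) / sum(exp(z))
}

# Made-up raw outputs for the digits 0 to 9.
raw.output <- c(-1.2, 0.3, 4.1, 0.0, -0.5, 1.7, -2.0, 0.8, 0.2, -0.9)

prob <- softmax(raw.output)
names(prob) <- 0:9
round(prob, 3)
sum(prob)                      # the probabilities sum to 1
which.max(prob) - 1            # predicted digit: positions start at 1, digits at 0
```

The `- 1` at the end is the same index-to-digit shift that appears as `max.col(t(preds)) - 1` when predictions are decoded later in these notes.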
mx.set.seed(0)
model_1 = mx.model.FeedForward.create(lenet, X = Train.X.array, y = Train.Y,
ctx = mx.cpu(), num.round = 20, array.batch.size = 100,
learning.rate = 0.05, momentum = 0.9, wd = 0.00001,
eval.metric = mx.metric.accuracy,
epoch.end.callback = mx.callback.log.train.metric(100))
preds = predict(model_1, Test.X.array)
pred.label = max.col(t(preds)) - 1
tab = table(pred.label, Test.Y)
cat("Testing accuracy rate =", sum(diag(tab))/sum(tab))
## Testing accuracy rate = 0.9863095
print(tab)
##           Test.Y
## pred.label    0    1    2    3    4    5    6    7    8    9
##          0 1650    0    1    3    1    2    1    0    2    6
##          1    0 1839    3    0    3    0    2    1    4    1
##          2    1    2 1639    8    0    0    1   24    7    0
##          3    0    0    3 1713    0    4    1    1    1    6
##          4    0    0    1    0 1591    0    2    3    2   21
##          5    0    2    0    8    0 1530    4    0    3    5
##          6    9    1    1    0    1    6 1648    0    3    0
##          7    0    4    4    0    3    1    0 1717    0    7
##          8    1    3    3    9    1    5    2    4 1652    5
##          9    2    0    1    1    6    3    0    3    1 1591
– If your computer has enough cores, you can use the commands below
n.cpu <- 4
device.cpu <- lapply(0:(n.cpu-1), function(i) {mx.cpu(i)})
model_2 = mx.model.FeedForward.create(lenet, X = Train.X.array, y = Train.Y,
ctx = device.cpu, num.round = 1, array.batch.size = 100,
learning.rate = 0.05, momentum = 0.9, wd = 0.00001,
eval.metric = mx.metric.accuracy,
arg.params = model_1$arg.params,
epoch.end.callback = mx.callback.log.train.metric(100))
PARAMS = model_1$arg.params
ls(PARAMS)
## [1] "convolution0_bias" "convolution0_weight"
## [3] "convolution1_bias" "convolution1_weight"
## [5] "fullyconnected0_bias" "fullyconnected0_weight"
## [7] "fullyconnected1_bias" "fullyconnected1_weight"
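The original image (28x28x1) first passes through 20 5x5 "convolution filters" (5x5x1x20), which turn it into 20 "first-level feature maps" (24x24x20).
These 20 "first-level feature maps" (24x24x20) then go through ReLU, producing 20 "transformed first-level feature maps" (24x24x20).
These 20 "transformed first-level feature maps" (24x24x20) are then processed by a 2x2 "pooling window" (2x2), which turns them into 20 "reduced first-level feature maps" (12x12x20).
The 20 "reduced first-level feature maps" (12x12x20) pass through 50 5x5 "convolution filters" (5x5x20x50), which turn them into 50 "second-level feature maps" (8x8x50).
These 50 "second-level feature maps" (8x8x50) then go through ReLU, producing 50 "transformed second-level feature maps" (8x8x50).
These 50 "transformed second-level feature maps" (8x8x50) are then processed by a 2x2 "pooling window" (2x2), which turns them into 50 "reduced second-level feature maps" (4x4x50).
The "reduced second-level feature maps" (4x4x50) are rearranged and flattened into the "first-level high-level features" (800).
The "first-level high-level features" (800) enter the "hidden layer", which outputs the "second-level high-level features" (500).
The "second-level high-level features" (500) go through ReLU, giving the "transformed second-level high-level features" (500).
The "transformed second-level high-level features" (500) enter the "output layer", which produces the "raw output" (10).
The "raw output" (10) is transformed by the Softmax function to decide which class the image belongs to.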
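The eight parameter names listed by `ls(PARAMS)` can be matched to this walk-through by counting the weights each layer needs. The shapes below are derived from the description above and are an assumption about the sizes, not output read from `model_1`. The order in which mxnet stores the dimensions may differ.

```r
# Expected parameter shapes, derived from the layer-by-layer description.
shapes <- list(
  convolution0_weight    = c(5, 5, 1, 20),    # 20 filters of 5x5 on 1 input map
  convolution0_bias      = 20,
  convolution1_weight    = c(5, 5, 20, 50),   # 50 filters of 5x5 on 20 input maps
  convolution1_bias      = 50,
  fullyconnected0_weight = c(800, 500),       # 4 * 4 * 50 = 800 inputs, 500 hidden units
  fullyconnected0_bias   = 500,
  fullyconnected1_weight = c(500, 10),        # 500 hidden units, 10 classes
  fullyconnected1_bias   = 10
)

n.params <- sapply(shapes, prod)
print(n.params)
sum(n.params)
```

Under these assumptions the first fully connected layer holds 400,500 of the 431,080 parameters, far more than both convolution layers together.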
Input = Test.X.array[,,,1]
dim(Input) = c(28, 28, 1, 1)
preds = predict(model_1, Input)
pred.label = max.col(t(preds)) - 1
par(mar=rep(0,4))
plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
img = as.raster(t(matrix(as.numeric(Input), nrow = 28)))
rasterImage(img, -0.04, -0.04, 1.04, 1.04, interpolate=FALSE)
text(0.05, 0.95, Test.Y[1], col = "green", cex = 2)
text(0.95, 0.95, pred.label, col = "blue", cex = 2)
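– Here we introduce an interesting example: using a convolutional neural network to code discharge summaries, where its performance even exceeded that of methods such as SVM and random forest
– We can use "word embedding" to turn words into vectors. The main goal is for words with similar meanings to lie very close together in the vector space, and "word2vec" lets us achieve this
– This file is large, so we can use the "data.table" package to help us read it quickly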
library(data.table)
library(magrittr)
library(plyr)    # load plyr before dplyr so that dplyr's functions are not masked
library(dplyr)
word.data = fread("data/glove.6B.50d.txt", header = FALSE)
## Read 341218 rows and 51 (of 51) columns from 0.137 GB file in 00:00:12
words.ref = word.data %>% select(V1) %>% setDF %>% .[,1] %>% as.character
words.matrix = word.data %>% select(-V1) %>% setDF %>% as.matrix
rownames(words.matrix) = words.ref
dim(words.matrix)
## [1] 341218 50
– Next, let's find which words are closest to "adenocarcinoma", using the cosine value as the measure
word = "adenocarcinoma"
word.vector = words.matrix[which(words.ref==word),]
other.vectors = words.matrix[which(words.ref!=word),]
Dot_Product = other.vectors %*% word.vector
distance.word = sqrt(sum(word.vector^2))
distance.other = apply(other.vectors, 1, function(x) {sqrt(sum(x^2))})
cos_value = Dot_Product/distance.word/distance.other
cos_value = cos_value[order(cos_value[,1], decreasing = TRUE),]
head(cos_value, 10)
## carcinoma squamous esophageal carcinomas metastatic
## 0.8821906 0.7957962 0.7850613 0.7850401 0.7780124
## hepatocellular nasopharyngeal melanoma ovarian nodules
## 0.7483364 0.7315709 0.7205641 0.7195940 0.7167293
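The cosine value used above is the dot product of two vectors divided by the product of their lengths. A minimal sketch on three made-up 3-dimensional "word vectors" (the words and numbers are purely illustrative) shows how it ranks neighbours:

```r
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Made-up embeddings: "tumor" and "cancer" point in similar directions.
toy.vectors <- rbind(
  tumor  = c(1.0, 0.9, 0.1),
  cancer = c(0.9, 1.0, 0.0),
  banana = c(0.0, 0.1, 1.0)
)

cos.to.tumor <- apply(toy.vectors, 1, cosine, b = toy.vectors["tumor", ])
sort(cos.to.tumor, decreasing = TRUE)
```

A word always has cosine 1 with itself, which is why the code above removes the query word from `other.vectors` before ranking.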
example = "Adenocarcinoma of stomach with peritoneal carcinomatosis and massive ascite, stage IV under bidirection chemotherapy (neoadjuvant intraperitoneal-systemic chemotherapy) with intraperitoneal paclitaxel 120mg (20151126, 20151201) and systemic with Oxalip (20151127) and oral XELOX."
text = tolower(example)
text = gsub("\n", "@@@@@", text, fixed = TRUE)
text = gsub("\r", "@@@@@", text, fixed = TRUE)
text = gsub("[ :,;-]", "@", text)
text = gsub("(", "@", text, fixed = TRUE)
text = gsub(")", "@", text, fixed = TRUE)
text = gsub("/", "@", text, fixed = TRUE)
text = strsplit(text, split = ".", fixed = TRUE)[[1]]
text = paste(text, collapse = "@@@@@")
text = strsplit(text, split = "@", fixed = TRUE)[[1]]
TEXT.ARRAY = matrix(0, nrow = length(text), ncol = 50)
# look up each token; tokens not found in the vocabulary keep an all-zero row
for (i in 1:length(text)) {
  if (text[i]!="") {
    pos = which(words.ref == text[i])
    if (length(pos)==1) {
      TEXT.ARRAY[i,] = words.matrix[pos,]
    }
  }
}
library(imager)
img = TEXT.ARRAY
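# clip values to [-2, 2] for display; the space in "< -2" matters, because "img<-2" is parsed as an assignment
img[img > 2] = 2
img[img < -2] = -2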
plot(as.cimg(t(img)))
load("data/ICD10.RData")
Train.X.array = ARRAY[,,1:3000]
dim(Train.X.array) = c(100, 50, 1, 3000)
Train.Y = LABEL[1:3000]
Vald.X.array = ARRAY[,,3001:4000]
dim(Vald.X.array) = c(100, 50, 1, 1000)
Vald.Y = LABEL[3001:4000]
Test.X.array = ARRAY[,,4001:5000]
dim(Test.X.array) = c(100, 50, 1, 1000)
Test.Y = LABEL[4001:5000]
library(mxnet)
get_symbol_textcnn <- function(num_outcome = 1,
                               filter_sizes = 1:5,
                               num_filter = c(40, 30, 15, 10, 5),
                               Seq.length = 100,
                               word.dimation = 50,
                               dropout = 0.5) {
  data <- mx.symbol.Variable('data')
  # one convolution -> ReLU -> max-pooling branch per filter size
  concat_lst <- NULL
  for (i in 1:length(filter_sizes)) {
    convi <- mx.symbol.Convolution(data = data,
                                   kernel = c(filter_sizes[i], word.dimation),
                                   pad = c(filter_sizes[i]-1, 0),
                                   num_filter = num_filter[i],
                                   name = paste0('conv', i))
    relui <- mx.symbol.Activation(data = convi,
                                  act_type = "relu",
                                  name = paste0('relu', i))
    pooli <- mx.symbol.Pooling(data = relui,
                               pool_type = "max",
                               kernel = c(Seq.length + filter_sizes[i] - 1, 1),
                               stride = c(1, 1),
                               name = paste0('pool', i))
    concat_lst = append(concat_lst, pooli)
  }
  # concatenate the pooled outputs of all branches
  concat_lst$num.args = length(filter_sizes)
  h_pool = mxnet:::mx.varg.symbol.Concat(concat_lst)
  # dropout layer
  if (dropout > 0) {
    h_drop = mx.symbol.Dropout(data = h_pool, p = dropout)
  } else {
    h_drop = h_pool
  }
  # fully connected layer
  cls_weight = mx.symbol.Variable('cls_weight')
  cls_bias = mx.symbol.Variable('cls_bias')
  fc = mx.symbol.FullyConnected(data = h_drop,
                                weight = cls_weight,
                                bias = cls_bias,
                                num_hidden = num_outcome)
  lr = mx.symbol.LogisticRegressionOutput(fc, name='lr')
  return(lr)
}
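The kernel, padding, and pooling sizes in this function are chosen so that every branch ends in exactly one number per filter. The sketch below only does the size arithmetic for the default arguments, assuming the usual output-length rule for a stride-1 convolution (input + 2 * pad - kernel + 1). It does not call mxnet.

```r
seq.length   <- 100
word.dim     <- 50
filter.sizes <- 1:5
num.filter   <- c(40, 30, 15, 10, 5)

# Convolution with kernel (k, 50) and padding (k - 1, 0):
# length = L + 2 * (k - 1) - k + 1 = L + k - 1, width = 50 - 50 + 1 = 1.
conv.length <- seq.length + 2 * (filter.sizes - 1) - filter.sizes + 1
conv.width  <- word.dim - word.dim + 1

# The pooling kernel (L + k - 1, 1) covers the whole map: one value per filter.
pool.length <- conv.length - (seq.length + filter.sizes - 1) + 1

data.frame(filter.size = filter.sizes, num.filter, conv.length, conv.width, pool.length)
sum(num.filter)   # length of the concatenated feature vector
```

Each filter therefore looks at k consecutive words at a time, and the max pooling keeps only its strongest response anywhere in the text, giving 100 features for the final logistic output.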
my.eval.metric.CE <- mx.metric.custom(
  name = "Cross-Entropy (CE)",
  function(real, pred) {
    real1 = as.numeric(real)
    pred1 = as.numeric(pred)
    pred1[pred1 <= 1e-6] = 1e-6
    pred1[pred1 >= 1 - 1e-6] = 1 - 1e-6
    return(-mean(real1 * log(pred1) + (1 - real1) * log(1 - pred1), na.rm = TRUE))
  }
)
mx.callback.early.stop <- function(period, logger = NULL, small.value = "good", tolerance = 1e-4) {
  function(iteration, nbatch, env, verbose) {
    if (nbatch %% period == 0 && !is.null(env$metric)) {
      result <- env$metric$get(env$train.metric)
      if (nbatch != 0) {
        if(verbose) {cat(paste0("Batch [", nbatch, "] Train-", result$name, "=", result$value, "\n"))}
      }
      if (!is.null(logger)) {
        if (class(logger) != "mx.metric.logger") {
          stop("Invalid mx.metric.logger.")
        } else {
          logger$train <- c(logger$train, result$value)
          if (!is.null(env$eval.metric)) {
            result <- env$metric$get(env$eval.metric)
            if (nbatch != 0) {cat(paste0("Batch [", nbatch, "] Validation-", result$name, "=", result$value, "\n"))}
            logger$eval <- c(logger$eval, result$value)
          }
        }
      }
    }
    if (!is.null(env$metric)) {
      if (length(logger$train) >= 10) {
        if (!is.null(env$eval.metric)) {TEST.VALUE = round(logger$eval/tolerance)} else {TEST.VALUE = round(logger$train/tolerance)}
        if (small.value=="good") {
          if (mean(tail(TEST.VALUE, 10)) <= mean(tail(TEST.VALUE, 5))) {return(FALSE)}
        } else {
          if (mean(tail(TEST.VALUE, 10)) >= mean(tail(TEST.VALUE, 5))) {return(FALSE)}
        }
      }
    }
    return(TRUE)
  }
}
mx.set.seed(0)
logger = mx.metric.logger$new()
cnn.model = mx.model.FeedForward.create(get_symbol_textcnn(),
X = Train.X.array, y = Train.Y,
eval.data = list(data = Vald.X.array, label = Vald.Y),
ctx = mx.cpu(), num.round = 100,
array.batch.size = 100, learning.rate = 0.05,
momentum = 0.9, wd = 0.00001,
eval.metric = my.eval.metric.CE,
epoch.end.callback = mx.callback.early.stop(100, logger, small.value = "good"))
pred.prob = predict(cnn.model, Test.X.array)
pred.y = pred.prob>0.5
tab = table(Test.Y, pred.y)
cat("Testing accuracy rate =", sum(diag(tab))/sum(tab))
## Testing accuracy rate = 0.943
print(tab)
##       pred.y
## Test.Y FALSE TRUE
##      0   663   31
##      1    26  280
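– Note, however, that "cnn.model" was trained on input of dimension (100, 50, 1, n), so even if your text has fewer than 100 words you still need to pad it with blanks up to 100!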
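One way to do this padding is sketched below. The helper `pad.to.length` is hypothetical. It assumes that a "blank" is an all-zero row (as for unknown words in `TEXT.ARRAY`), that the blanks go at the end, and that longer texts are cut at 100 words. How `ARRAY` in ICD10.RData was actually built is not shown in these notes, so check those assumptions before relying on the predictions.

```r
pad.to.length <- function(text.array, seq.length = 100) {
  n.words <- nrow(text.array)
  if (n.words >= seq.length) {
    # assumption: texts longer than seq.length are truncated
    return(text.array[1:seq.length, , drop = FALSE])
  }
  # assumption: blanks are all-zero rows appended after the last word
  padding <- matrix(0, nrow = seq.length - n.words, ncol = ncol(text.array))
  rbind(text.array, padding)
}

# Stand-in for the TEXT.ARRAY of a 37-word sentence (random values).
short.text <- matrix(rnorm(37 * 50), nrow = 37, ncol = 50)

padded <- pad.to.length(short.text)
dim(padded)

# Reshape to the 4-dimensional layout used for training.
new.input <- padded
dim(new.input) <- c(100, 50, 1, 1)
dim(new.input)
```

An array shaped like `new.input` is what would then be handed to `predict(cnn.model, ...)`.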
1. Adenocarcinoma of stomach with peritoneal carcinomatosis and massive ascite, stage IV under bidirection chemotherapy (neoadjuvant intraperitoneal-systemic chemotherapy) with intraperitoneal paclitaxel 120mg (20151126, 20151201) and systemic with Oxalip (20151127) and oral XELOX.
2. Chronic kidney disease, stage V with pulmonary edema underwent emergent hemodialysis, status post arteriovenous graft creation with maintenance hemodialysis.
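– Compared with ordinary (fully connected) neural networks, the success of convolutional neural networks is mostly attributed to parameter sharing in the convolution layers.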
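A rough count shows how much parameter sharing saves. The sketch compares the first convolution layer of the LeNet above with a hypothetical fully connected layer that maps the same 28x28x1 input to the same 24x24x20 output. The fully connected layer is an illustration only, not something used in these notes.

```r
# Convolution layer: 20 filters of 5x5x1, each reused at every image position.
conv.params <- 5 * 5 * 1 * 20 + 20

# Fully connected layer with the same input and output sizes:
# every output unit has its own weight for every input pixel.
n.input      <- 28 * 28 * 1
n.output     <- 24 * 24 * 20
dense.params <- n.input * n.output + n.output

c(conv = conv.params, dense = dense.params, ratio = dense.params / conv.params)
```

The convolution layer needs 520 parameters where the fully connected layer would need about 9 million, because the same 25 weights of a filter are applied at all 24x24 positions.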
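– Convolutional neural networks are not limited to recognizing images. They can also be applied to documents, speech, and so on, as long as the data can be structured as an abstract 2-D map.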